Abstract

In this paper, we develop a new chained equipercentile equating procedure for the nonequivalent
groups with anchor test (NEAT) design under the assumptions of the classical test theory model.
This new equating is named chained true score equipercentile equating. We also apply the kernel
equating framework to this equating design, resulting in a family of chained true score
equipercentile equating functions, which include the Levine true score equating model as a
special case.

Key words: NEAT design, chained true score equipercentile equating (CTSEE), Levine true
score equating (LTSE), kernel equating (KE)

Introduction



Non-IRT equating has been evolving for more than 80 years. At ETS, for a nonequivalent
groups anchor test (NEAT) design, commonly used equating methods are linear and
equipercentile chained equating (CE), linear and equipercentile post-stratification equating
(PSE), the Tucker method, and the Levine true and Levine observed score methods (Angoff,
1982; Braun & Holland, 1982). Only the equipercentile CE and equipercentile PSE are
nonlinear.

Kernel equating (KE) methodology (von Davier, Holland, & Thayer, 2004) gives a 
systematic treatment of several commonly used equating designs. One of its main properties is 
that it is able to connect a linear method with a nonlinear method by changing the values of the 
parameter known as the bandwidth. 

Levine true score equating (LTSE) and Levine observed score equating (LOSE; see
Kolen & Brennan, 2004), which have not been discussed to date in regard to the KE framework,
have some favorable properties under certain circumstances. Recently, some work has been done
to find nonlinear versions of LOSE, such as a hybrid Levine equipercentile equating (von Davier,
Fournier-Zajac, & Holland, 2006), and a modified post-stratification equating (Wang & Brennan,
2007).

This paper constructs a new nonlinear equating method called chained true score
equipercentile equating (CTSEE), based on the classical test theory model. Using the KE
framework, we will construct a family of CTSEE functions and show that LTSE is a special
member of this family.

There are six sections in this paper. After the introduction, the next section reviews
chained equating and Levine true score equating. The third section discusses the KE process steps
and their properties, and the fourth covers the construction of CTSEE. The fifth section has
examples and discussion, and the last section is the conclusion.

Review of Chained Equating and Levine True Score Equating 

CE is a classical method that applies to a NEAT design (Angoff, 1971; Dorans, 1990;
Livingston, Dorans, & Wright, 1990). For tests X and Y with anchor A, and populations P and Q
taking X and Y, respectively, the chained equipercentile equating from X to Y is the composition
of two equipercentile equatings: from X to A on population P, and from A to Y on population Q.
If we assume that for each test the marginal distributions of all involved test scores are of
similar shapes, then each equipercentile equating can be approximated by a suitable linear
equating (von Davier et al., 2004, p. 12, Theorem 1.1). Hence, the resulting chained
equipercentile equating can be approximated by a suitable linear equating called a chained
linear equating.

LTSE uses true scores to equate X to Y. By the classical test theory model (see Feldt & 
Brennan, 1989): 



$$X = T_X + E_X,$$

where $T_X$ is the true score associated with X, and $E_X$ is the error, which has zero mean and
zero correlation with the true score (i.e., $\mu_X = \mu_{T_X}$ and
$\sigma_X^2 = \sigma_{T_X}^2 + \sigma_{E_X}^2$). The correlation of X with $T_X$,
$\sigma_{T_X}/\sigma_X$, is the square root of the reliability of test X (see Lord & Novick,
1968). Similar results also hold for $A_P$ (A with population P), Y, and $A_Q$. For convenience,
for any univariate distribution X:

$$\rho_X = \frac{\sigma_{T_X}}{\sigma_X}. \tag{1}$$

One assumption of LTSE is that the true scores of the test and its anchor are perfectly correlated,
which makes linking X to $A_P$ in true score form a linear function that has this form:

$$T_{A_P} = \frac{\sigma_{T_{A_P}}}{\sigma_{T_X}}\,(T_X - \mu_X) + \mu_{A_P}. \tag{2}$$

Similarly, there is a linear linking from $A_Q$ to Y:

$$T_Y = \frac{\sigma_{T_Y}}{\sigma_{T_{A_Q}}}\,(T_{A_Q} - \mu_{A_Q}) + \mu_Y. \tag{3}$$

Under the assumptions that (2) and (3) are population invariant, substitute (2) into (3) to get the
formula for LTSE:

$$
\begin{aligned}
y = \mathrm{LTSE}(x)
&= \frac{\sigma_{T_Y}}{\sigma_{T_{A_Q}}}\left[\frac{\sigma_{T_{A_P}}}{\sigma_{T_X}}\,(x - \mu_X) + \mu_{A_P} - \mu_{A_Q}\right] + \mu_Y \\
&= \frac{\sigma_{T_{A_P}}\,\sigma_{T_Y}}{\sigma_{T_X}\,\sigma_{T_{A_Q}}}\,(x - \mu_X)
 + \frac{\sigma_{T_Y}}{\sigma_{T_{A_Q}}}\,(\mu_{A_P} - \mu_{A_Q}) + \mu_Y.
\end{aligned}
\tag{4}
$$



Another assumption for Levine's methods is that error variances are invariant among all
populations. This is used, along with the linearity between a test and its anchor on true scores, to
calculate the true score standard deviation $\sigma_T$ (and hence the corresponding correlations
$\rho$) for any distribution. See Brennan (1990) for detailed computations.

Review of Kernel Equating Procedures Applied to Chained Equating 

KE, first proposed by Holland and Thayer (1989) and later fully developed by von Davier 
et al. (2004), gives a systematic treatment for many well-known equating designs to derive the 
equating functions. It consists of five basic steps: 

1. Presmoothing the score probabilities by fitting a log-linear model. This step can be
omitted or modified by using alternative models.

2. Estimating the score probabilities. This step estimates the score distributions for both
test X and test Y, denoted as r and s, respectively. The equating design plays a
crucial role in this step.

3. Continuizing r and s. Use kernel techniques to make continuized density functions
and the related continuized distribution functions (CDFs) from the discrete score
distributions created in Step 2. This is the unique step that defines KE. The kernel
used here is the normal distribution, also known as the Gaussian kernel.

4. Computing the equating. This step computes the equating function by composing two
or more CDFs made in Step 3.

5. Calculating the standard error of equating. This step uses C-matrices either generated
in Step 1 or calculated within KE for nonpresmoothed data (Moses & Holland, 2006).

The book The Kernel Method of Test Equating (von Davier et al., 2004) explicitly details
the five steps applied to a chained equating. This paper deals with only Steps 2-4 since the other
steps are not relevant to the topic of this paper.

Estimation of the Score Probabilities for Chained Equating 

Given tests X and Y with common items known as anchor A, two bivariate score
distributions are obtained. Then the marginal distributions are estimated for X and its anchor
from the first data set and are denoted as $X$ and $A_P$, respectively. Similarly, marginal
distributions $Y$ and $A_Q$ are obtained from the second data set.

Continuization of the Marginal Distributions 

Unlike other designs, CE needs four continuized score densities. The details on X are 
given below. 

Let $\{(x_j, r_j)\}$ be the marginal distribution of X with probability $r_j$ for each score $x_j$.
Let $\mu_X$ and $\sigma_X$ be the mean and standard deviation of X, respectively. For any positive
number $h_X$, called the bandwidth, $a_X$ is defined as:

$$a_X = \sqrt{\frac{\sigma_X^2}{\sigma_X^2 + h_X^2}}, \tag{5}$$

and $\eta = a_X h_X$.

Then for each $x_j$, $R_{jX}(x)$ is defined as

$$R_{jX}(x) = \frac{x - \mu_X - a_X\,(x_j - \mu_X)}{\eta}, \tag{6}$$

and the continuized distribution function (CDF) of X with bandwidth $h_X$ is:

$$F_{h_X}(x) = \sum_j r_j\,\Phi\bigl(R_{jX}(x)\bigr), \tag{7}$$

where $\Phi$ is the CDF of the standard normal distribution.

It is convenient to define a new random variable $X(h_X)$ to study the properties of $F_{h_X}(x)$.
Let $V$ be independent of X with the standard normal distribution. With given $h_X$,

$$X(h_X) = a_X\,(X + h_X V) + (1 - a_X)\,\mu_X. \tag{8}$$

It is shown in von Davier et al. (2004) that

$$F_{h_X}(x) = \mathrm{Prob}\bigl(X(h_X) \le x\bigr). \tag{9}$$

Some properties of $X(h_X)$ are:

1. $\lim_{h_X \to 0} X(h_X) = X$ (10)

2. $\lim_{h_X \to \infty} X(h_X) = \sigma_X V + \mu_X$ (11)

3. $E(X(h_X)) = \mu_X$ and $E((X(h_X) - \mu_X)^2) = \sigma_X^2$, for all $h_X$. (12)



Property 1 shows that X can be approximated by $X(h_X)$ with small $h_X$ (< 0.5). Property 2
indicates that $X(h_X)$, or any other distribution with a large bandwidth, is almost a normal
distribution. In particular, all such modified distributions are (almost) of the same shape.
Property 3 certifies that the mean and standard deviation of $X(h_X)$ will never change.

Similarly, $G_{h_Y}(y)$, the CDF of Y on population Q, $H_{P,h_{A_P}}(a)$, the CDF of A on
population P, and $H_{Q,h_{A_Q}}(a)$, the CDF of A on population Q, can be defined with given
bandwidths $h_Y$, $h_{A_P}$, and $h_{A_Q}$, respectively.
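The continuization in (5) through (7), together with Properties 1 and 2, can be illustrated with a minimal Python sketch. The five-point score distribution below is a made-up toy example, and the function names (`std_normal_cdf`, `continuized_cdf`) are illustrative assumptions; they are not taken from any KE software.

```python
import math

def std_normal_cdf(z):
    """CDF of the standard normal distribution, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def continuized_cdf(scores, probs, h):
    """Return (F, mu, sigma), where F is the Gaussian-kernel CDF with bandwidth h."""
    mu = sum(r * x for x, r in zip(scores, probs))
    var = sum(r * (x - mu) ** 2 for x, r in zip(scores, probs))
    a = math.sqrt(var / (var + h * h))  # a_X as in (5)
    eta = a * h                         # eta = a_X * h_X

    def F(x):
        # Sum over score points of r_j * Phi(R_jX(x)), as in (6) and (7).
        return sum(
            r * std_normal_cdf((x - mu - a * (xj - mu)) / eta)
            for xj, r in zip(scores, probs)
        )

    return F, mu, math.sqrt(var)

# Toy distribution (hypothetical): scores 0..4 with symmetric probabilities.
scores = [0, 1, 2, 3, 4]
probs = [0.1, 0.2, 0.4, 0.2, 0.1]

F_small, mu, sigma = continuized_cdf(scores, probs, h=0.1)
F_large, _, _ = continuized_cdf(scores, probs, h=500.0)

# Small bandwidth: F stays close to the discrete CDF between score points.
print(F_small(2.5))
# Large bandwidth: F is close to a normal CDF with the same mean and SD.
print(F_large(3.0), std_normal_cdf((3.0 - mu) / sigma))
```

With the small bandwidth the value at 2.5 should sit near the discrete cumulative probability 0.7, while with the large bandwidth the two printed values should nearly coincide, in line with Properties 1 and 2.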

Computation of the Chained Equating Function eY(CE)(x) 

The computation of the chained equating function $e_{Y(CE)}(x)$ is:

$$e_{Y(CE)}(x) = G_{h_Y}^{-1}\Bigl(H_{Q,h_{A_Q}}\bigl(H_{P,h_{A_P}}^{-1}(F_{h_X}(x))\bigr)\Bigr). \tag{13}$$

When both $h_X$ and $h_{A_P}$ are small, by Property 1 of $X(h_X)$ (and $A_P(h_{A_P})$), the
composition function $H_{P,h_{A_P}}^{-1}(F_{h_X}(x))$ is an approximation of the equipercentile
equating function from X to A.

For very large $h_X$ and $h_{A_P}$, since the density functions for both $H_{P,h_{A_P}}(a)$ and
$F_{h_X}(x)$ are normal density functions by Property 2, $H_{P,h_{A_P}}^{-1}(F_{h_X}(x))$ becomes a
linear equating (von Davier et al., 2004, p. 12, Theorem 1.1), which has the form:

$$a = \frac{\sigma_{A_P}}{\sigma_X}\,(x - \mu_X) + \mu_{A_P}. \tag{14}$$



The same arguments apply to test Y with anchor $A_Q$. Hence, for small $h_X$, $h_Y$, $h_{A_P}$, and
$h_{A_Q}$, $e_{Y(CE)}(x)$ is the chained equipercentile equating, but for large bandwidths, it is
the chained linear equating, whose function is:

$$y = \mathrm{Lin}_{CE}(x) = \frac{\sigma_Y}{\sigma_{A_Q}}\left[\frac{\sigma_{A_P}}{\sigma_X}\,(x - \mu_X) + \mu_{A_P} - \mu_{A_Q}\right] + \mu_Y. \tag{15}$$
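The composition of continuized CDFs described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the KE Software: the four toy score distributions are invented, a single bandwidth `h` is used for all four marginals for simplicity, and the inverses are obtained by plain bisection. The helper names (`kernel_cdf`, `inverse`, `chained_equate`, `chained_linear`) are hypothetical.

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def moments(scores, probs):
    """Mean and standard deviation of a discrete score distribution."""
    mu = sum(r * x for x, r in zip(scores, probs))
    var = sum(r * (x - mu) ** 2 for x, r in zip(scores, probs))
    return mu, math.sqrt(var)

def kernel_cdf(scores, probs, h):
    """Gaussian-kernel continuized CDF with bandwidth h."""
    mu, sd = moments(scores, probs)
    a = sd / math.sqrt(sd * sd + h * h)
    return lambda x: sum(
        r * phi((x - mu - a * (xj - mu)) / (a * h)) for xj, r in zip(scores, probs)
    )

def inverse(cdf, u, lo=-1e3, hi=1e3, iters=200):
    """Invert an increasing CDF by bisection on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def chained_equate(x, dist_x, dist_ap, dist_aq, dist_y, h):
    """Chain X -> A on P, then A -> Y on Q, by composing continuized CDFs."""
    F = kernel_cdf(*dist_x, h)
    H_P = kernel_cdf(*dist_ap, h)
    H_Q = kernel_cdf(*dist_aq, h)
    G = kernel_cdf(*dist_y, h)
    a = inverse(H_P, F(x))      # link X to A on population P
    return inverse(G, H_Q(a))   # link A to Y on population Q

def chained_linear(x, dist_x, dist_ap, dist_aq, dist_y):
    """Chained linear equating built from the four means and SDs."""
    mu_x, sd_x = moments(*dist_x)
    mu_ap, sd_ap = moments(*dist_ap)
    mu_aq, sd_aq = moments(*dist_aq)
    mu_y, sd_y = moments(*dist_y)
    a = (sd_ap / sd_x) * (x - mu_x) + mu_ap
    return (sd_y / sd_aq) * (a - mu_aq) + mu_y

# Toy marginal distributions (hypothetical data).
DIST_X = ([0, 1, 2, 3, 4], [0.1, 0.2, 0.4, 0.2, 0.1])
DIST_AP = ([0, 1, 2], [0.25, 0.5, 0.25])
DIST_Y = ([0, 1, 2, 3, 4], [0.2, 0.3, 0.3, 0.15, 0.05])
DIST_AQ = ([0, 1, 2], [0.4, 0.4, 0.2])

for x in (1.0, 2.0, 3.0):
    small = chained_equate(x, DIST_X, DIST_AP, DIST_AQ, DIST_Y, h=0.3)
    large = chained_equate(x, DIST_X, DIST_AP, DIST_AQ, DIST_Y, h=500.0)
    linear = chained_linear(x, DIST_X, DIST_AP, DIST_AQ, DIST_Y)
    print(x, small, large, linear)
```

With the large bandwidth the chained result should track the chained linear function closely, whereas the small bandwidth gives a curvilinear link that follows the shapes of the toy distributions.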



Constructing a Chained True Score Equipercentile Equating
via the Kernel Equating (KE) Framework

From the previous discussion of chained equipercentile equating, it is easy to see how to
construct a CTSEE; that is, to make the linkings on the true scores under the assumption that the
linking from $T_X$ to $T_A$ and from $T_A$ to $T_Y$ are both population invariant. Now the problem
is how to get the true scores when an observed score distribution is given. Theoretically, the
problem is unsolvable. In practice, however, there are many ways to approximate the true scores.

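To find a true score $X_T$ for X when $\sigma_{T_X}$ is given, the basic criteria are:

1. $E(X_T) = \mu_X$ (16)

2. $E((X_T - \mu_X)^2) = \sigma_{T_X}^2$. (17)

By the discussion in the previous section on KE, the goal is to construct a distribution defined
in (18):

$$X_T(h_X) = a_{T_X}\,(X + h_X V) + b_{T_X}, \tag{18}$$

for a given number $h_X$, and solve (18) through (20) for both $a_{T_X}$ and $b_{T_X}$:

$$E(X_T(h_X)) = \mu_X, \tag{19}$$

$$E((X_T(h_X) - \mu_X)^2) = \sigma_{T_X}^2. \tag{20}$$

Substitute (18) into (19) to get

$$E(X_T(h_X)) = E(a_{T_X} X) + E(a_{T_X} h_X V) + E(b_{T_X}) = a_{T_X}\mu_X + 0 + b_{T_X} \overset{\text{set}}{=} \mu_X. \tag{21}$$

Hence,

$$b_{T_X} = (1 - a_{T_X})\,\mu_X. \tag{22}$$

Substitute (18) and (22) into (20) to get

$$
\begin{aligned}
E((X_T(h_X) - \mu_X)^2) &= E\bigl((a_{T_X}(X + h_X V) + (1 - a_{T_X})\mu_X - \mu_X)^2\bigr) \\
&= E\bigl((a_{T_X}(X - \mu_X + h_X V))^2\bigr) \\
&= a_{T_X}^2\,\bigl(E((X - \mu_X)^2) + E((h_X V)^2)\bigr) \\
&= a_{T_X}^2\,(\sigma_X^2 + h_X^2),
\end{aligned}
\tag{23}
$$

where the cross term vanishes because $V$ is independent of X and has zero mean. The solution
for $a_{T_X}$ is

$$a_{T_X} = \sqrt{\frac{\sigma_{T_X}^2}{\sigma_X^2 + h_X^2}}. \tag{24}$$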



Since $h_X$ is an arbitrary number, a family of true scores $\{X_T(h_X)\}$ related to X is created.
Substitute (22) and (24) into (18), with the properties that $\lim_{h_X \to \infty} a_{T_X} = 0$ and
$\lim_{h_X \to \infty} a_{T_X} h_X = \sigma_{T_X}$, to get

$$\lim_{h_X \to \infty} X_T(h_X) = \sigma_{T_X} V + \mu_X. \tag{25}$$
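The solution in (22) and (24) can be checked numerically with a short Python sketch. It evaluates the mean and variance of $X_T(h_X)$ analytically, using only $E(V) = 0$, $\mathrm{Var}(V) = 1$, and the independence of X and V. The input values are the ones reported for test X in the example later in this paper, (29) and (30); the function names are illustrative assumptions.

```python
import math

def true_score_coefficients(mu, sigma, sigma_t, h):
    """Return (a, b) in X_T(h) = a * (X + h * V) + b, following (22) and (24)."""
    a = sigma_t / math.sqrt(sigma ** 2 + h ** 2)
    b = (1.0 - a) * mu
    return a, b

def family_moments(mu, sigma, sigma_t, h):
    """Mean and variance of X_T(h), with V standard normal and independent of X."""
    a, b = true_score_coefficients(mu, sigma, sigma_t, h)
    mean = a * mu + b                          # E(V) = 0
    variance = a ** 2 * (sigma ** 2 + h ** 2)  # Var(X + h V) = sigma^2 + h^2
    return mean, variance

# Mean, SD, and true score SD of X, as reported in (29) and (30).
mu_x, sigma_x, sigma_tx = 39.2515, 17.2252, 16.5262

# The mean and SD of X_T(h) should not depend on the bandwidth.
for h in (0.5, 30.0, 500.0):
    mean, variance = family_moments(mu_x, sigma_x, sigma_tx, h)
    print(h, mean, math.sqrt(variance))

# Limiting behaviour of the coefficient a for very small and very large h.
a_small, _ = true_score_coefficients(mu_x, sigma_x, sigma_tx, 1e-6)
a_large, _ = true_score_coefficients(mu_x, sigma_x, sigma_tx, 1e6)
print(a_small, sigma_tx / sigma_x)  # a tends to sigma_T / sigma as h -> 0
print(a_large * 1e6, sigma_tx)      # a * h tends to sigma_T as h -> infinity
```

For every bandwidth the printed mean and standard deviation should reproduce $\mu_X$ and $\sigma_{T_X}$, which is exactly what criteria (19) and (20) require.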

Families of true scores can also be made for $A_P$, Y, and $A_Q$, with similar properties in (19),
(20), and (25), respectively. Then, for very large bandwidths, the CTSEE is virtually the LTSE. The
result can be stated as a theorem:

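Theorem. For a NEAT design having test X with anchor $A_P$ and test Y with anchor $A_Q$, let
$T_{(\cdot)}$ be the true score of, and $h_{(\cdot)}$ be the bandwidth associated with, the
specified test, and let $\mu_{(\cdot)}$ and $\sigma_{(\cdot)}$ be the mean and standard deviation of
the labeled distribution. Suppose that in the continuization step of the KE process, the true score
variables $X_T(h_X)$, $Y_T(h_Y)$, $A_{P,T}(h_{A_P})$, and $A_{Q,T}(h_{A_Q})$ are defined as in (18),
with coefficients given by (22) and (24). Let

$$
\begin{aligned}
F_{h_X,T}(x) &= \mathrm{Prob}(X_T(h_X) \le x), \\
G_{h_Y,T}(y) &= \mathrm{Prob}(Y_T(h_Y) \le y), \\
H_{P,h_{A_P},T}(a) &= \mathrm{Prob}(A_{P,T}(h_{A_P}) \le a), \\
H_{Q,h_{A_Q},T}(a) &= \mathrm{Prob}(A_{Q,T}(h_{A_Q}) \le a).
\end{aligned}
$$

Then for the equating function

$$e_{Y(T)}(x) = G_{h_Y,T}^{-1}\Bigl(H_{Q,h_{A_Q},T}\bigl(H_{P,h_{A_P},T}^{-1}(F_{h_X,T}(x))\bigr)\Bigr) \tag{26}$$

with large bandwidths (preferably no less than 30 times the related standard deviations,
respectively), it follows that

$$e_{Y(T)}(x) = \mathrm{LTSE}(x). \tag{27}$$

The $e_{Y(T)}(x)$'s are called CTSEE functions.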

Example and Discussion 

Any equating method can be decomposed as a linear part and a nonlinear part. The
decomposition is quite natural under the KE framework. The original equating in KE (i.e.,
with quite small bandwidths) is a sum of the linear portion (i.e., with very large bandwidths) and
the remainder, which can be written as:

$$e_Y(x) = \mathrm{Lin}_e(x) + \mathrm{Rsd}_e(x), \tag{28}$$

where $\mathrm{Rsd}_e(x) = e_Y(x) - \mathrm{Lin}_e(x)$ is the residue of the equating function minus
its linear portion. There is a belief that if two equating methods are related but different, the
majority of their differences lies in their linear portions. This idea led to the development of the
hybrid Levine equipercentile equating (von Davier et al., 2006). So if one method is to replace
another method under some conditions for data, the difference between these two methods is expected
to be mainly in their linear portions. The following example, (29), checks to see if it is true for
both CE and CTSEE.

The design and data sets are given in Chapter 10 of von Davier et al. (2004). Test X has
78 items with external anchor $A_P$, which has 35 items, and test Y has 78 items with anchor
$A_Q = A_P$. Using the same notation as before,

$$
\begin{aligned}
&\mu_X = 39.2515,\quad \mu_Y = 32.6866,\quad \mu_{A_P} = 17.0540,\quad \mu_{A_Q} = 14.3864, \\
&\sigma_X = 17.2252,\quad \sigma_Y = 16.7271,\quad \sigma_{A_P} = 8.3329,\quad \sigma_{A_Q} = 8.2082,
\end{aligned}
$$

and

$$
\begin{aligned}
\mathrm{Cov}(X, A_P) &= \mathrm{Cov}(T_X, T_{A_P}) = \rho_{T_X T_{A_P}}\,\sigma_{T_X}\,\sigma_{T_{A_P}} = 126.4198, \\
\mathrm{Cov}(Y, A_Q) &= \mathrm{Cov}(T_Y, T_{A_Q}) = \rho_{T_Y T_{A_Q}}\,\sigma_{T_Y}\,\sigma_{T_{A_Q}} = 120.0982,
\end{aligned}
\tag{29}
$$

where $\rho_{T_X T_{A_P}}$ is the correlation coefficient of $T_X$ and $T_{A_P}$. Next, calculate
$\sigma_{T_X}$, and so on, using the formulas in Brennan (1990) with the assumptions of the classical
congeneric model, in particular, that both $\rho_{T_X T_{A_P}}$ and $\rho_{T_Y T_{A_Q}}$ equal 1.
Later, this paper will cover the case that at least one of them is less than 1.

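Using the numbers in (29) and the formula in Brennan (1990; with external $A_P$):

$$
\sigma_{T_X} = \sqrt{\frac{\mathrm{Cov}(X, A_P) + \sigma_X^2}{\mathrm{Cov}(X, A_P) + \sigma_{A_P}^2}\,\mathrm{Cov}(X, A_P)}
= \sqrt{\frac{126.4198 + 17.2252^2}{126.4198 + 8.3329^2}\times 126.4198}
= \sqrt{273.1159} = 16.5262. \tag{30}
$$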



Similarly,

$$\sigma_{T_Y} = 16.0056,\quad \sigma_{T_{A_P}} = 7.6497,\quad \sigma_{T_{A_Q}} = 7.5035. \tag{31}$$

Substituting suitable numbers from (29) into (15) results in

$$\mathrm{Lin}_{CE}(x) = 0.98584x - 0.57295. \tag{32}$$

Substituting the suitable numbers from (29) through (31) into (4) results in

$$\mathrm{LTSE}(x) = 0.98736x - 0.37874. \tag{33}$$

Hence,

$$\mathrm{LTSE}(x) - \mathrm{Lin}_{CE}(x) = 0.00152x + 0.19421. \tag{34}$$
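The arithmetic in (30) through (34) can be retraced with a few lines of Python. This is only a sketch: it assumes perfectly correlated true scores, so that the anchor's true score standard deviation follows from (29) as the covariance divided by the test's true score standard deviation, and the function names are illustrative. Because the inputs in (29) are rounded, the recomputed intercepts can be expected to match (32) and (33) to roughly three decimal places rather than exactly.

```python
import math

# Summary statistics from (29).
mu_x, mu_y, mu_ap, mu_aq = 39.2515, 32.6866, 17.0540, 14.3864
sd_x, sd_y, sd_ap, sd_aq = 17.2252, 16.7271, 8.3329, 8.2082
cov_x_ap, cov_y_aq = 126.4198, 120.0982

def true_score_sds(sd_test, sd_anchor, cov):
    """True score SDs of a test and its external anchor.

    The test SD follows (30); the anchor SD uses Cov = sd_T(test) * sd_T(anchor),
    which is (29) with a true score correlation of 1 (an assumption of this sketch).
    """
    ratio = (cov + sd_test ** 2) / (cov + sd_anchor ** 2)
    sd_t_test = math.sqrt(ratio * cov)
    sd_t_anchor = cov / sd_t_test
    return sd_t_test, sd_t_anchor

def linear_chain(s_x, s_ap, s_aq, s_y):
    """Slope and intercept of the chained linear form shared by (4) and (15)."""
    slope = (s_ap * s_y) / (s_x * s_aq)
    intercept = mu_y + (s_y / s_aq) * (mu_ap - mu_aq) - slope * mu_x
    return slope, intercept

sd_tx, sd_tap = true_score_sds(sd_x, sd_ap, cov_x_ap)
sd_ty, sd_taq = true_score_sds(sd_y, sd_aq, cov_y_aq)

lin_ce = linear_chain(sd_x, sd_ap, sd_aq, sd_y)        # compare with (32)
ltse = linear_chain(sd_tx, sd_tap, sd_taq, sd_ty)      # compare with (33)
diff = (ltse[0] - lin_ce[0], ltse[1] - lin_ce[1])      # compare with (34)

print("true score SDs:", sd_tx, sd_ty, sd_tap, sd_taq)
print("Lin_CE slope, intercept:", lin_ce)
print("LTSE slope, intercept:", ltse)
print("difference:", diff)
```

The only change between the two linear functions is which standard deviations enter the slope: observed score values for the chained linear equating, true score values for LTSE.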

Using (28) and (34) and the fact that the linear part of CTSEE is LTSE, the difference between
CTSEE and CE can be written as

$$
\begin{aligned}
\mathrm{CTSEE}(x) - \mathrm{CE}(x) &= (\mathrm{LTSE}(x) + \mathrm{Rsd}_{CTSEE}(x)) - (\mathrm{Lin}_{CE}(x) + \mathrm{Rsd}_{CE}(x)) \\
&= \mathrm{LTSE}(x) - \mathrm{Lin}_{CE}(x) + \mathrm{Rsd}_{CTSEE}(x) - \mathrm{Rsd}_{CE}(x) \\
&= 0.00152x + 0.19421 + \mathrm{Rsd}_{CTSEE}(x) - \mathrm{Rsd}_{CE}(x).
\end{aligned}
\tag{35}
$$

Using the data above, CTSEE(x) and CE(x), both with bandwidths = 0.5, and LTSE(x)
and $\mathrm{Lin}_{CE}(x)$, both with bandwidths = 500, were computed with KE Software, which is
currently under development at ETS. Both $\mathrm{LTSE}(x) - \mathrm{Lin}_{CE}(x)$ and
$\mathrm{Rsd}_{CTSEE}(x) - \mathrm{Rsd}_{CE}(x)$ are plotted in Figure 1.

It is obvious that $\mathrm{LTSE}(x) - \mathrm{Lin}_{CE}(x)$ has some bias, while
$\mathrm{Rsd}_{CTSEE}(x) - \mathrm{Rsd}_{CE}(x)$ has almost none. Simple computations show that
Mean($\mathrm{LTSE}(x) - \mathrm{Lin}_{CE}(x)$) = 0.254, and
Mean($\mathrm{Rsd}_{CTSEE}(x) - \mathrm{Rsd}_{CE}(x)$) = 0.059. The data sets are highly correlated
(both correlation coefficients are in the range of 0.87-0.88), and LTSE(x) is quite close to
$\mathrm{Lin}_{CE}(x)$. Otherwise, Mean($\mathrm{LTSE}(x) - \mathrm{Lin}_{CE}(x)$) would be much
bigger. Both LTSE(x) and $\mathrm{Lin}_{CE}(x)$ as computed by the software agree with (33) and
(32), respectively, up to four decimal places.

Computing $\sigma_{T_X}$, and so on, is impossible without additional assumptions. By using the
classical congeneric model, which assumes that the true scores between the main test and its
anchor are perfectly correlated, it is purely technical to compute the values of the terms in both
(30) and (31). In general, making such an assumption will contradict the purpose of CTSEE.
However, in practice, for most equating cases, the correlation coefficient of two related true
scores is greater than 0.99, which is demonstrated in Figure 2.




Figure 1. The linear difference and the residue (Rsd) difference between chained true score 
equipercentile equating (CTSEE) and chained equating (CE). 




Figure 2. Computing the correlation coefficient for a curvilinear function.

Both X and Y are assumed to be true scores, with Y a function of X, as shown in Figure 2,
which may represent a general pattern of an equating function that is not linear. By fitting a line
to the curve, $\rho_{T_X T_Y} = R = \sqrt{0.991} = 0.9959$.

For the case that the correlation coefficient of true scores is smaller than 0.99, an iterating
process will be used. First, (30) is used to compute $\sigma_{T_X}$, and so on. Secondly, a true
score equipercentile equating from $T_X$ to $T_{A_P}$ will be constructed, using the previously
computed $\sigma_{T_X}$ and $\sigma_{T_{A_P}}$, resulting in a function from $T_X$ to $T_{A_P}$.
Hence $\rho_{T_X T_{A_P}}$ can be computed. Then $\sigma_{T_X}$ is computed again, but replacing
$\mathrm{Cov}(X, A_P)$ with

$$\frac{\mathrm{Cov}(X, A_P)}{\rho_{T_X T_{A_P}}}$$

in (30). This process continues until $\rho_{T_X T_{A_P}}$ converges. However, whether
$\rho_{T_X T_{A_P}}$ will converge or under what conditions it will converge are not the topics
in this paper.

What happens to $X_T(h_X)$ when $h_X \to 0$? It is apparent that $a_{T_X}$ becomes $\rho_X$
defined in (1), so

$$\lim_{h_X \to 0} X_T(h_X) = \lim_{h_X \to 0}\bigl(a_{T_X}(X + h_X V) + (1 - a_{T_X})\mu_X\bigr) = \rho_X X + (1 - \rho_X)\,\mu_X, \tag{36}$$

which is the squeezing process to replace the original score distribution by its true scores,
proposed by Brennan and Lee (2006) and used by Wang and Brennan (2007) on the anchor
marginal distributions for their modified PSE method.

Conclusion 

CTSEE extends LTSE so that equating on true scores is not a linear function, just as
chained equipercentile equating extends chained linear equating on observed scores. Under the
KE framework, CTSEE and LTSE can be connected naturally by varying the values of the
bandwidths. Additional computations for CTSEE are needed and can be done with the help of
the classical congeneric model, although sometimes adjustments are needed.

Just like LTSE, CTSEE only equates true scores. This makes it less practical than other
curvilinear equating methods. However, this new approach opens the field. More studies on this
method will reveal properties so far unknown to researchers and practitioners, leading to
improvements of this method as well as the development of new methods.

Notes

1 The work that led to this paper was collaborative in every respect and the order of authorship is
alphabetical.







